Conversion of Imbalanced Data Into A Stream Using SMOTE Algorithm
Abstract
Machine learning approaches become especially important when the distribution of the data is unknown, because classifying data from such a data set is difficult. Raw data can be characterized by whether it takes only discrete values or is continuous. In real-world applications, data drawn from a non-stationary distribution gives rise to the problem of “concept drift”, also called “non-stationary learning”. Drift in a data set is often associated with online learning scenarios. Because the goal of intelligent machine learning algorithms is to address a wide spectrum of real-world scenarios, the need for a general framework for learning from, and adapting to, a non-stationary environment that may introduce imbalanced data can hardly be overstated. This paper focuses on imbalanced data, which results in unequal representation of classes in a pattern recognition problem. An imbalanced pattern recognition problem typically has two types of class: majority (negative) and minority (positive).
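The abstract does not spell out the SMOTE step named in the title. As a rough illustration only, the commonly described SMOTE idea is to create each synthetic minority sample at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below assumes plain numeric feature vectors; the function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the paper, which may combine this step with stream processing in ways not shown here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature lists (hypothetical input).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # random point on the segment joining base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed for illustration): 4 minority and 12 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(12)]
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))  # prints: 12 12
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies inside the region already spanned by the minority class; this is also why noisy minority samples can produce noisy synthetic ones.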
Similar resources
Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection
The Synthetic Minority Over Sampling TEchnique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that it cleans the data before applying SMOTE, such...
Oversampling Method for Imbalanced Classification
The classification problem for imbalanced datasets is pervasive in many data mining domains. Imbalanced classification has been a hot topic in the academic community. From the data level to the algorithm level, many solutions have been proposed to tackle the problems resulting from imbalanced datasets. SMOTE is the most popular data-level method, and many derivations based on it have been developed to ...
Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE
Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In th...
Improving SMOTE with Fuzzy Rough Prototype Selection to Detect Noise in Imbalanced Classification Data
In this paper, we present a prototype selection technique for imbalanced data, Fuzzy Rough Imbalanced Prototype Selection (FRIPS), to improve the quality of the artificial instances generated by the Synthetic Minority Over-sampling TEchnique (SMOTE). Using fuzzy rough set theory, the noise level of each instance is measured, and instances for which the noise level exceeds a certain threshold le...
SMOTE for Learning from Imbalanced Data: Progress and Challenges. Marking the 15-year Anniversary
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm has been established as a “de facto” standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to different types of problems. Since its publication in 2002, it has proven successful in a number of different applicati...
Publication date: 2013